Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Better SCSI/SAS support, and removing confused metrics #168

Merged
merged 2 commits into from
Nov 20, 2023

Conversation

robbat2
Copy link
Contributor

@robbat2 robbat2 commented Oct 16, 2023

Implement better SCSI/SAS support, including less confused metrics.

The exporter, prior to this PR, exports a value of "0" for some metrics
that were specific to certain types of drives; these metrics will no
longer be exported for types where that metric is not valid. Future work
may include parsing the corresponding metrics for SATA/SAS SSDs.

Metrics no longer exported for the wrong type of drive:

  • smartctl_device_nvme_capacity_bytes (NVME-specific)
  • smartctl_device_available_spare (NVME-specific, ATA possible)
  • smartctl_device_available_spare_threshold (NVME-specific, ATA
    possible)
  • smartctl_device_critical_warning (NVME-specific, ATA possible)
  • smartctl_device_interface_speed (ATA-specific)
  • smartctl_device_media_errors (NVME-specific, ATA possible)
  • smartctl_device_num_err_log_entries (NVME-specific, SCSI uses distinct
    metrics, ATA possible)
  • smartctl_device_nvme_capacity_bytes (NVME-specific)
  • smartctl_device_percentage_used (NVME-specific, ATA possible)

Fix the following metrics that were exported as zero because the
exporter did not know how to read them for SCSI devices:

  • smartctl_device_bytes_read
  • smartctl_device_bytes_written
  • smartctl_device_power_cycle_count

New metrics:

  • smartctl_read_errors_corrected_by_eccdelayed
  • smartctl_read_errors_corrected_by_eccfast
  • smartctl_write_errors_corrected_by_eccdelayed
  • smartctl_write_errors_corrected_by_eccfast

Fix labels:

  • smartctl_device{model_name} is now populated for SCSI/SAS, based on
    scsi_model_name.

New labels:

  • smartctl_device{} gains:
    scsi_product,scsi_revision,scsi_vendor,scsi_version

Signed-off-by: Robin H. Johnson [email protected]

@robbat2
Copy link
Contributor Author

robbat2 commented Oct 16, 2023

Query:

  1. Do you need this split more
  2. Is there a JSON test corpus that gets published for testing somewhere?

The exporter presently has metrics that are nonsense for a given type of
drive, and remain at zero due to their defaults.

Change the behavior to NOT emit a metric if the underlying JSON field is
not present.

Future related work may include parsing the corresponding metrics for
SATA/SAS SSDs (e.g. `smartctl_device_percentage_used` could derived from
`SSD_Life_Left` on some drives).

Metrics no longer exported for the wrong type of drive:
- `smartctl_device_nvme_capacity_bytes` (NVME-specific)
- `smartctl_device_available_spare` (NVME-specific, ATA possible)
- `smartctl_device_available_spare_threshold` (NVME-specific, ATA
  possible)
- `smartctl_device_critical_warning` (NVME-specific, ATA possible)
- `smartctl_device_interface_speed` (ATA-specific)
- `smartctl_device_media_errors` (NVME-specific, ATA possible)
- `smartctl_device_num_err_log_entries` (NVME-specific, SCSI uses
  distinct metrics, ATA possible)
- `smartctl_device_nvme_capacity_bytes` (NVME-specific)
- `smartctl_device_percentage_used` (NVME-specific, ATA possible)

Signed-off-by: Robin H. Johnson <[email protected]>
Fix the following metrics that were exported as zero because the
exporter did not know how to read them for SCSI devices:
- smartctl_device_bytes_read
- smartctl_device_bytes_written
- smartctl_device_power_cycle_count

New metrics:
- smartctl_read_errors_corrected_by_eccdelayed
- smartctl_read_errors_corrected_by_eccfast
- smartctl_write_errors_corrected_by_eccdelayed
- smartctl_write_errors_corrected_by_eccfast

Fix labels:
- smartctl_device{model_name} is now populated for SCSI/SAS, using
  scsi_model_name.

New labels:
- smartctl_device{} gains:
  scsi_product,scsi_revision,scsi_vendor,scsi_version

Signed-off-by: Robin H. Johnson <[email protected]>
@robbat2
Copy link
Contributor Author

robbat2 commented Nov 2, 2023

@SuperQ can you review and merge please?

@NiceGuyIT
Copy link
Member

  1. Is there a JSON test corpus that gets published for testing somewhere?

There isn't one but it would be nice to start building one. I don't have any SCSI/SAS devices so I can't really test this PR. Can you attach the smartctl output as a file, or better yet, submit it as another PR. Use the same flags as the program.

@robbat2
Copy link
Contributor Author

robbat2 commented Nov 8, 2023

@NiceGuyIT Can we consider a formal consistent script that redacts sensitive info in the JSON?

Partial list:

.smartctl.platform_info
.smartctl.build_info
.local_time.time_t # redact to consistent value
.local_time.asctime  # redact to consistent value
.serial_number
.logical_unit_id
.wwn.id
.firmware_version # some organizations might have custom firmware

I've got 20+ models of drive over technologies & vendors from my personal access, counting before $job drives.

SCSI
SATA
SAS
NVME

@robbat2
Copy link
Contributor Author

robbat2 commented Nov 8, 2023

#175 contains the test data suggestion.

@SuperQ SuperQ requested a review from NiceGuyIT November 20, 2023 15:32
Copy link
Contributor

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks!

@NiceGuyIT NiceGuyIT merged commit 1f56220 into prometheus-community:master Nov 20, 2023
1 check passed
SuperQ added a commit that referenced this pull request Mar 3, 2024
* [CHANGE] Better SCSI/SAS support, and removing confused metrics #168
* [ENHANCEMENT] Impvoe the JSON collection script; now requires jq/yq #176
* [BUGFIX] Shell fixes for `collect-smartctl-json.sh` #178
* [BUGFIX] Various fixes to `collect_fake_json.sh` #159

Signed-off-by: SuperQ <[email protected]>
@SuperQ SuperQ mentioned this pull request Mar 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants